ABSTRACT



The advent of large instantaneous bandwidth receivers and high spectral resolution spectrometers on (sub-)millimeter telescopes has
opened up the possibilities for unbiased spectral surveys. Because of the large amount of data they contain, any analysis of these
surveys requires dedicated software tools. Here we present an extension of the widely used CLASS software that we developed for
that purpose. This extension, named Weeds, allows for searches in atomic and molecular line databases (e.g. JPL or CDMS) that
may be accessed over the internet using a virtual observatory (VO) compliant protocol. The package permits quick navigation
across a spectral survey to search for lines of a given species. Weeds is also capable of modeling a spectrum, as often needed for
line identification. We expect that Weeds will be useful for analyzing and interpreting the spectral surveys that will be done with the
HIFI instrument on board Herschel, as well as observations carried out with ground-based millimeter and sub-millimeter telescopes and
interferometers, such as IRAM-30m and Plateau de Bure, CARMA, SMA, eVLA, and ALMA.

Key words. ISM: molecules - ISM: lines and bands - Line: identification - Methods: data analysis - Virtual observatory tools 



1. Introduction 

A spectral survey consists of a series of spectra covering a significant spectral domain.
At (sub-)millimeter wavelengths, a spectral survey typically covers several tens of GHz.
Spectral surveys are generally referred to as unbiased if they provide complete coverage
with uniform sensitivity. As such, they allow for a complete census of the species
emitting in that band, and sometimes for the discovery of new interstellar species. In
addition, because a given band often contains many transitions of the same species, the
simultaneous analysis of all these lines provides stringent constraints on the physical
conditions in the emitting gas, such as the density and temperature. Spectral surveys are
therefore very useful for characterizing both the chemical composition and the physical
conditions in the observed objects.

Ever since the pioneering work of Johansson et al. (1984), who carried out an unbiased
spectral survey of the Orion KL star-forming region and the IRC +10216 carbon-rich star
between 72 and 91 GHz with the Onsala telescope, many spectral surveys have been carried
out at millimeter and sub-millimeter wavelengths using ground-based telescopes (see
Herbst & van Dishoeck 2009, for a review). Because of the limited sensitivity of the
instruments available at that time, early spectral surveys were targeted at bright
star-forming regions, such as Orion KL and Sgr B2 in the millimeter range. Thanks to the
increasing sensitivity of heterodyne receivers and the availability of sub-millimeter
telescopes, these surveys were later extended to higher frequencies (e.g. Schilke et al.
1997, 2001; Comito et al. 2005) and carried out towards fainter young stellar objects
(e.g. NGC 1333 IRAS4 or IRAS 16293-2422; Blake et al. 1994; van Dishoeck et al. 1995;
Blake et al. 1995). A few spectral surveys have been carried out with millimeter and
sub-millimeter interferometers, such as OVRO or the SMA (e.g. Blake et al. 1996;
Beuther et al. 2006). The HIFI instrument (de Graauw et al. 2010) onboard the Herschel
space observatory (Pilbratt et al. 2010) now allows for complete coverage of the almost
unexplored 480-1250 and 1410-1910 GHz frequency bands. Its large spectral coverage (up
to 4 GHz instantaneous bandwidth) and unprecedented sensitivity in this frequency range
enable astronomers to carry out spectral surveys over almost 1.5 THz down to the line
confusion limit in a few tens of hours. The first spectral surveys with this instrument
have already given spectacular results (Bergin et al. 2010; Ceccarelli et al. 2010).
Among these, we can cite the richness of the Orion BN-KL spectrum observed at THz
frequencies (see Fig. 2 of Bergin et al. 2010) or the discovery of ND in IRAS 16293-2422
(Bacmann et al. 2010).

Current developments in (sub-)millimeter instruments include an increase in the
instantaneous bandwidth of the detection devices. During the past decade, the
instantaneous bandwidth of tunable heterodyne receivers has increased by more than an
order of magnitude, now routinely reaching ~10 GHz. Other technologies (e.g. HEMT, FCRAO
and IRAM) have already provided several tens of GHz, although it is still unclear whether
the sensitivity of these receivers can match that of SIS receivers. This increase in
bandwidth has been accompanied by the advent of digital spectrometers (autocorrelators,
fast Fourier transform), whose versatility allows the coverage of such bandwidths with a
spectral resolution down to a few hundred kHz. As a result, unbiased spectral surveys of
the 3 mm atmospheric window (ν = 80-117 GHz) can be done with the IRAM-30m telescope in
~10 hours, with a 2 mK noise at 1σ in 2 MHz (~6 km/s) spectral channels. The ALMA
interferometer will also permit coverage of large frequency windows, providing spectral
cubes with up to 8 GHz bandwidth (Wootten ...). Thanks to its sensitivity, this
instrument will allow, in its compact configuration, line surveys to be carried out down
to the confusion limit toward a large number of sources. Spectral surveys are thus still
in their infancy and will very likely become routine observing modes in the coming years.

Spectral surveys covering large frequency bands require specific tools to be analyzed
efficiently. In this article, we present a software package intended for the analysis of
spectral surveys. In § 2 we briefly describe how such surveys are analyzed. In § 3 we
detail how our software was designed and implemented to carry out such an analysis.
Finally, § 4 concludes this article and discusses future developments.

2. Spectral surveys analysis 

The analysis of a spectral survey usually consists of identifying the various lines and
deriving the physical and chemical properties of the emitting gas (density, temperature
and column densities of the observed species). The main difficulty in such identification
is that large molecules may have hundreds of lines in the (sub-)millimeter range. These
species - such as methanol, methyl formate or dimethyl ether - are often named weeds by
spectroscopists. If the lines are too broad, they may overlap and blend together, which
makes the identification of weaker lines difficult. This is the line confusion limit
(Schilke et al. 1997): line identification is not limited by the signal-to-noise ratio of
the observations, but by line blending.

Because of this problem, extreme care must be taken when identifying species from a
spectral survey. Herbst & van Dishoeck (2009) summarize the criteria for a firm detection
as follows: "(i) Rest frequencies are accurately known to 1:10^7, either from direct
laboratory measurements or from a high-precision Hamiltonian model; (ii) observed
frequencies of clean, nonblended lines agree with rest frequencies for a single
well-determined velocity of the source; if a source has a systematic velocity field as
determined from simple molecules, any velocity gradient found for lines of a new complex
molecule cannot be a random function of transition frequency; (iii) all predicted lines
of a molecule based on an LTE spectrum at a well-defined rotational temperature and
appropriately corrected for beam dilution are present in the observed spectrum at roughly
their predicted relative intensities. A single anticoincidence (that is, a predicted line
missing in the observational data) is a much stronger criterion for rejection than
hundreds of coincidences are for identification. This last criterion is one of the
strongest arguments for complete line surveys rather than targeted line searches."

The rest frequencies needed to fulfill criterion (i) are usually taken from spectral line
catalogs, such as the Cologne Database for Molecular Spectroscopy (CDMS; Müller et al.
2001) or the JPL Molecular Spectroscopy catalog (Pickett et al. 1998). For criterion (ii)
we need to compare the consistency of the centroid velocities of all the line candidates.
Finally, criterion (iii) requires modeling the predicted emission of the given species so
that it can be compared with the observations. The traditional technique for this
consists of building a rotational diagram (Goldsmith & Langer 1999) to see if all
detected lines agree with a single rotational temperature and column density.
Alternatively, one can compute a synthetic spectrum and compare it directly with the
observations - a technique called forward fitting (Comito et al. 2005). This approach is
also extremely useful when one wants to search for weak lines of a species among hundreds
from various weeds: a synthetic spectrum of the emission of the weeds can be constructed
to fit the observed transitions in an iterative fashion. Once the brightest lines have
been modeled, one can compare the synthetic spectrum to the observed one to look for
lines from less abundant species (see Belloche et al. 2008, for an example of this
technique). Of course, this also allows the physical and chemical properties of the
emitting gas to be derived.
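
As an illustration of the rotational diagram technique mentioned above, the following
minimal sketch fits a straight line to ln(N_u/g_u) as a function of E_u/k. It assumes
optically thin LTE emission, for which ln(N_u/g_u) = ln(N_tot/Q) - E_u/(k T_rot). The
function name and the synthetic input values are hypothetical; this is not Weeds code.

```python
import numpy as np


def rotational_diagram_fit(e_up_k, ln_nu_over_gu):
    """Least-squares fit of ln(N_u/g_u) = ln(N_tot/Q) - E_u/(k T_rot).

    e_up_k        : upper level energies E_u/k, in K
    ln_nu_over_gu : natural log of the upper level column density over g_u
    Returns (T_rot in K, ln(N_tot/Q)).
    """
    slope, intercept = np.polyfit(e_up_k, ln_nu_over_gu, 1)
    t_rot = -1.0 / slope  # the slope of the diagram is -1/T_rot
    return t_rot, intercept


# Synthetic, noise-free points generated for T_rot = 80 K (illustrative values only).
e_up = np.array([20.0, 50.0, 120.0, 200.0])
ln_n = 30.0 - e_up / 80.0
t_rot, ln_n_over_q = rotational_diagram_fit(e_up, ln_n)
```

With real data, lines that fall far from the fitted straight line may indicate
blending, opacity effects or non-LTE excitation.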

Since spectral surveys may contain thousands of lines, they require specific tools to be
efficiently analyzed. Two packages have been developed for that purpose. The first of
them, XCLASS (Schilke et al. 2001), is an extension of the widely used CLASS data
reduction software, which is part of Gildas. XCLASS contains a spectral line database
which is built from the CDMS and JPL catalogs. Technically, it uses the MySQL database
server, which must be installed on the user's computer. This database may be updated
manually, by replacing the database file with the one provided by the program authors.
XCLASS allows the user to look for lines corresponding to a given frequency in its
catalog, but also to build an LTE model of the observed spectra. XCLASS has been
successfully used to reduce several spectral surveys obtained with the CSO and the
IRAM-30m (Schilke et al. 2001; Comito et al. 2005; Belloche et al. 2008). However, XCLASS
is based on an obsolete version of CLASS, which is no longer maintained. Indeed, the
CLASS internal structures were largely rewritten in 2005-2006 to adapt to the challenges
of data reduction coming with the recent generation of receivers (Hily-Blant et al.
2005). The second package, CASSIS, has been developed primarily to analyze Herschel-HIFI
spectral surveys, although it can be used to analyze surveys from ground-based telescopes
as well. CASSIS itself does not have data reduction capabilities; therefore data must
first be reduced in another software package such as CLASS or HIPE (Ott et al., in prep.)
before analysis in CASSIS. CASSIS uses a database which is built from the CDMS and the
JPL catalogs; in recent CASSIS versions, this database (SQLite) is embedded in the
program so that an external database server is no longer required. Like XCLASS, CASSIS
allows the forward-fitting of a spectrum, but also the search for the various transitions
of a given species.

3. Weeds design and implementation 

3.1. General design 

Weeds has been designed specifically to analyze spectral surveys, following the approach
presented in § 2. Although its development was inspired by the XCLASS and CASSIS
packages, it differs in several aspects. Weeds is an extension of the current version of
the CLASS software, and is mostly written in the Python language, except for a few
commands written in the Gildas command interpreter (SIC) language. To do this, Weeds uses
the new possibility offered by GILDAS to interleave Python and SIC in the same session
(Bardeau et al. 2010). In particular, the variable contents are shared between Python and
SIC. Python has several advantages over other languages for developing such extensions.
It benefits from a large library of modules that allow complex tasks - such as making a
query in a VO-compliant database, see § 3.2 - to be done relatively easily. Although it
is interpreted, it is still computationally efficient, because critical modules (e.g. the
module for array computations that we use for the spectra modeling, see § 3.4) are
written in compiled languages such as C or Fortran. Weeds has been distributed with
Gildas since April 2010. The source code is freely available from the IRAM website. A
user manual is also available on that page.

Because Weeds is an extension of CLASS, it can be used to analyze any data format that
CLASS supports. In practice, the CLASS data format is used by many ground-based
telescopes (e.g. IRAM-30m, CSO and APEX). Data from other telescopes can be converted to
FITS format and imported into CLASS as well. For example, Herschel-HIFI data can be
imported into CLASS through the FITS filler delivered by the HIPE data reduction software
(Delforge et al., in prep.). In order to analyze data in Weeds, the data must have been
calibrated and reduced first. The reduction usually consists of flagging the bad
channels, averaging together the scans covering the same frequency range, and removing a
polynomial baseline. If the data were obtained with a double sideband (DSB) receiver,
sideband deconvolution might be needed in order to produce a single sideband (SSB)
spectrum. This requires a special observing technique, i.e. a number of overlapping
spectra with shifted local oscillator frequencies. Deconvolution can then be performed in
CLASS using the algorithm developed by Comito & Schilke (2002). Thus data reduction and
analysis can be done within the same environment.



3.2. Spectral line catalogs queries 

As mentioned above, line identification requires repeated queries to spectral line
catalogs, such as the CDMS or the JPL. Unlike XCLASS and CASSIS - which require a custom
catalog installed on the user's computer - Weeds performs queries in spectral line
databases through the Internet^2. This has the advantage of not requiring any update of a
custom catalog: changes in the database, such as species additions or line frequency
corrections or updates, are readily available in Weeds. In order to make queries in
spectral line catalogs, we have implemented the VO-compliant Simple Line Access Protocol
(SLAP; Salgado et al. 2009) in Weeds. This protocol allows spectral line database queries
to be made in a standardized way; any database that implements the protocol can be
accessed by Weeds. Because it is a VO standard, it is likely that more and more spectral
line databases will use it in the future. Nonetheless, as of this writing only the CDMS
is accessible using that protocol, through an interface at the Paris VO Observatory
(Moreau et al. 2008). Therefore, in order to access the JPL catalog from Weeds, we have
implemented queries in the specific protocol which is used by this database. The CDMS can
be accessed through its own protocol as well.
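
To give an idea of what a SLAP request looks like, the sketch below builds a query URL
for a frequency range. It relies on two features of the SLAP standard: the request type
is `REQUEST=queryData`, and the search range is given through the `WAVELENGTH` parameter
as vacuum wavelengths in meters separated by a slash. The service address and the
function name are hypothetical, and this is not the Weeds implementation; a real client
would then fetch the URL and parse the VOTable that the service returns.

```python
from urllib.parse import urlencode

C_LIGHT = 299792458.0  # speed of light, m/s


def slap_query_url(base_url, freq_min_hz, freq_max_hz):
    """Build a SLAP queryData URL for a frequency range given in Hz.

    SLAP expresses the range as wavelengths in meters, so the frequency
    bounds are converted and swapped (highest frequency = shortest wavelength).
    """
    wl_min = C_LIGHT / freq_max_hz
    wl_max = C_LIGHT / freq_min_hz
    params = {
        "REQUEST": "queryData",
        "WAVELENGTH": f"{wl_min:.9e}/{wl_max:.9e}",
    }
    # Keep the "/" range separator unescaped in the query string.
    return base_url + "?" + urlencode(params, safe="/")


# Hypothetical service address, for illustration only.
url = slap_query_url("https://example.org/slap", 524.2e9, 525.2e9)
```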

For the moment, only one database can be used at a time; it is not possible to combine
the catalogs, i.e. to take some species from the JPL and others from the CDMS. In the
future, the VAMDC project will provide a single, unified database, including
state-of-the-art spectroscopic data from both the CDMS and the JPL catalogs. We plan on
implementing access to this database from Weeds as soon as it is released.

From the user's point of view, Weeds provides a command to search for lines corresponding
to a given frequency range in a spectral line catalog. The user can select a region on
the spectrum displayed in CLASS, and the command prints all the lines from the catalog
around the selected region. The lines can be filtered on the basis of the species they
belong to, their Einstein coefficient, or their upper level energy. For double sideband
spectra, a command option allows the search for lines from the image band.
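
The filtering and the image-band search described above can be sketched as follows. This
is an illustration only: the `Line` record, the threshold semantics (a minimum Einstein
coefficient and a maximum upper level energy) and the catalog entries are assumptions,
not the Weeds data model. The image frequency is obtained by mirroring the signal
frequency about the local oscillator frequency, which is the usual relation for a double
sideband receiver.

```python
from dataclasses import dataclass


@dataclass
class Line:
    species: str
    frequency_mhz: float  # rest frequency, MHz
    einstein_a: float     # Einstein A coefficient, s^-1
    e_upper_k: float      # upper level energy, K


def filter_lines(lines, species=None, min_einstein_a=0.0,
                 max_e_upper_k=float("inf")):
    """Keep lines of one species above an A threshold and below an E_up threshold."""
    return [
        ln for ln in lines
        if (species is None or ln.species == species)
        and ln.einstein_a >= min_einstein_a
        and ln.e_upper_k <= max_e_upper_k
    ]


def image_frequency(signal_freq_mhz, lo_freq_mhz):
    """Image-band frequency that falls in the same channel: mirror about the LO."""
    return 2.0 * lo_freq_mhz - signal_freq_mhz


# Made-up catalog entries, for illustration only.
catalog = [
    Line("A", 100000.0, 1e-5, 20.0),
    Line("A", 100050.0, 1e-7, 300.0),
    Line("B", 100020.0, 5e-5, 15.0),
]
selected = filter_lines(catalog, species="A",
                        min_einstein_a=1e-6, max_e_upper_k=100.0)
```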



3.3. Lines browsing/identification 

To secure the detection of a species in a spectral survey, one needs, according to
criterion (ii), to search for all the transitions of that species in the entire frequency
range covered by the survey. One also needs to measure the velocity of each line to check
that they correspond to a single velocity. Weeds allows the user to browse through a
survey rapidly. For this, Weeds has a command to search for all the lines of a given
species that fall in the frequency range covered by the survey. The command prints the
lines in the terminal, but also builds an internal index containing all these lines,
which can be ordered either by increasing frequency or by increasing upper level energy.
Another command allows the user to examine each of the line candidates one by one, to see
if the line is detected or not. This command zooms on a small frequency region around the
(expected) line, and also sets the velocity scale with respect to the rest frequency of
the line. A vertical mark is also drawn on the displayed spectrum at the source velocity,
so that one can easily determine if the line is detected or not. A Gaussian fit of the
observed line may be performed to determine the velocity of each line.
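
Setting the velocity scale with respect to a line rest frequency can be sketched as
below. The sketch assumes the radio Doppler convention, v = c (ν0 - ν) / ν0, and uses an
intensity-weighted mean velocity as a crude stand-in for the Gaussian fit; both function
names are hypothetical and this is not the CLASS or Weeds implementation.

```python
import numpy as np

C_KMS = 299792.458  # speed of light, km/s


def velocity_axis(freq_mhz, rest_freq_mhz):
    """Radio-convention velocity (km/s) of each channel for a given rest frequency."""
    return C_KMS * (rest_freq_mhz - np.asarray(freq_mhz)) / rest_freq_mhz


def centroid_velocity(velocity, intensity):
    """Intensity-weighted mean velocity over the channels of a line window."""
    velocity = np.asarray(velocity)
    intensity = np.asarray(intensity)
    return np.sum(velocity * intensity) / np.sum(intensity)
```

Comparing the centroid velocities obtained in this way for all the candidate lines of a
species is one way to check that they agree with a single source velocity.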



3.4. Spectra modeling 

Once several transitions of a given species have been found, one needs to check if the
relative intensities of these components agree with a single excitation temperature
(criterion (iii)). In addition, one needs to make sure that non-detected lines are
consistent with the excitation temperature derived from other species - or in other
words, that no lines are "missing". For this, Weeds allows the user to compute a
synthetic spectrum that can be compared directly with the observations (forward-fitting).
Following the approach used in XCLASS and described in Comito et al. (2005), the
synthetic spectrum is computed assuming that the emission arises from one or several
components at LTE. Although this approximation is simplistic - it is well known that in
the interstellar medium species are often out of local thermodynamic equilibrium, and
many sources are known to have density and temperature gradients - such a zeroth-order
approach is often extremely useful to identify lines, as mentioned above. Once the lines
have been identified, a more realistic modeling, taking into account non-LTE excitation
effects as well as the source structure, can be carried out.

Under these assumptions, and after baseline subtraction, the brightness temperature of a
given species as a function of the rest frequency $\nu$ is given by:

$$T_B(\nu) = \eta \left[ J_\nu(T_\mathrm{ex}) - J_\nu(T_\mathrm{bg}) \right] \left( 1 - e^{-\tau(\nu)} \right) \qquad (1)$$



where $\eta$ is the beam dilution factor, which, for a source with a Gaussian brightness
profile and a Gaussian beam, is equal to:

$$\eta = \frac{\theta_s^2}{\theta_s^2 + \theta_t^2} \qquad (2)$$

where $\theta_s$ and $\theta_t$ are the source and telescope beam FWHM sizes,
respectively.

^2 However, Weeds can make a cache of part or all of a catalog, so that it can be used
later with no Internet connection.

For the sake of simplicity, the telescope beam size is assumed to be given by the
diffraction limit^3:



$$\theta_t = 1.22 \, \frac{c}{\nu D} \qquad (3)$$

where $c$ is the speed of light and $D$ is the diameter of the telescope.
$T_\mathrm{bg}$ is the brightness temperature of the background emission, i.e. the
physical temperature of a black body that would produce the same background continuum
emission (e.g. 2.73 K for the cosmic microwave background). $T_\mathrm{ex}$ is the
excitation temperature, and the opacity $\tau(\nu)$ is:



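$$\tau(\nu) = \frac{c^2}{8 \pi \nu^2} \, \frac{N_\mathrm{tot}}{Q(T_\mathrm{ex})} \sum_i A^i g_u^i \, e^{-E_u^i / k T_\mathrm{ex}} \left( e^{h \nu_0^i / k T_\mathrm{ex}} - 1 \right) \phi^i \qquad (4)$$

where the summation is over each line of the considered species. Here $N_\mathrm{tot}$ is
the total column density of the species considered, $Q(T_\mathrm{ex})$ is the partition
function, $A^i$ is the Einstein coefficient of the $i$ line, $g_u^i$ and $E_u^i$ are the
upper level degeneracy and energy of the $i$ line, and $\phi^i$ is the $i$ line profile
function. The latter is given by: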



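$$\phi^i = \frac{1}{\sigma \sqrt{2 \pi}} \, e^{-(\nu - \nu_0^i)^2 / 2 \sigma^2} \qquad (5)$$

where $\nu_0^i$ is the $i$ line rest frequency and $\sigma$ the line width in frequency
units at 1/e. $\sigma$ can be expressed as a function of the line FWHM in velocity units
$\Delta V$ as follows:

$$\sigma = \frac{\nu_0^i}{c \sqrt{8 \ln 2}} \, \Delta V \qquad (6)$$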
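
The model of Eqs. (1) to (6) can be sketched in a few lines of NumPy. This is a minimal
illustration under stated assumptions, not the Weeds source code: $J_\nu(T)$ is taken to
be the usual radiation temperature, (hν/k) / (exp(hν/kT) - 1); the opacity is the
standard LTE expression with a Gaussian line profile; SI units are used throughout
(column density in m^-2, line width in m/s, upper level energy given as E_u/k in K); and
the function names and the single test line are hypothetical.

```python
import numpy as np

H = 6.62607015e-34  # Planck constant, J s
K_B = 1.380649e-23  # Boltzmann constant, J/K
C = 299792458.0     # speed of light, m/s


def j_nu(freq, temp):
    """Radiation temperature J_nu(T) = (h nu / k) / (exp(h nu / k T) - 1), in K."""
    x = H * freq / K_B
    return x / np.expm1(x / temp)


def beam_dilution(freq, source_size_arcsec, dish_diameter):
    """Eqs. (2)-(3): Gaussian source in a diffraction-limited Gaussian beam."""
    theta_t = 1.22 * C / (freq * dish_diameter)         # radians
    theta_s = np.radians(source_size_arcsec / 3600.0)   # radians
    return theta_s**2 / (theta_s**2 + theta_t**2)


def opacity(freq, lines, n_tot, q, t_ex, fwhm_v):
    """Eqs. (4)-(6). lines: iterable of (nu0 [Hz], A [s^-1], g_u, E_u/k [K])."""
    total = np.zeros_like(freq)
    for nu0, a_ul, g_u, e_u in lines:
        sigma = nu0 * fwhm_v / (C * np.sqrt(8.0 * np.log(2.0)))
        phi = np.exp(-(freq - nu0)**2 / (2.0 * sigma**2)) / (sigma * np.sqrt(2.0 * np.pi))
        total += (a_ul * g_u * np.exp(-e_u / t_ex)
                  * np.expm1(H * nu0 / (K_B * t_ex)) * phi)
    return C**2 / (8.0 * np.pi * freq**2) * n_tot / q * total


def brightness_temperature(freq, lines, n_tot, q, t_ex, fwhm_v,
                           source_size_arcsec, dish_diameter, t_bg=2.73):
    """Eq. (1): beam-diluted LTE brightness temperature of one component, in K."""
    tau = opacity(freq, lines, n_tot, q, t_ex, fwhm_v)
    eta = beam_dilution(freq, source_size_arcsec, dish_diameter)
    return eta * (j_nu(freq, t_ex) - j_nu(freq, t_bg)) * -np.expm1(-tau)


# One made-up line near 100 GHz, for illustration only.
freq = np.linspace(99.99e9, 100.01e9, 2001)
lines = [(100.0e9, 1e-5, 3.0, 10.0)]
tb = brightness_temperature(freq, lines, n_tot=1e18, q=100.0, t_ex=50.0,
                            fwhm_v=4e3, source_size_arcsec=10.0, dish_diameter=30.0)
```

Increasing `n_tot` by several orders of magnitude in this sketch drives the line center
to the optically thick limit, where the peak temperature no longer depends on the column
density.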



Note that some of the model parameters may be degenerate in certain cases. In the
optically thick limit, the source size and temperature are degenerate; in the optically
thin limit, the source size and column density are degenerate (see Eqs. (1) and (3)).
This degeneracy can usually be lifted if both thick and thin lines are present in the
survey, or if lines from a rare isotopologue are detected together with the main one
(e.g. $^{13}$CH$_3$OH and CH$_3$OH). The source size may also be constrained from
interferometric observations.
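
The degeneracy can be made explicit by writing out the two limits of Eq. (1). This is a
short sketch, in which $\Delta J$ is shorthand (introduced here) for the difference of the
two $J_\nu$ terms in Eq. (1), and which uses the fact that the opacity is proportional to
the column density at a fixed excitation temperature:

$$\tau \ll 1: \quad T_B \simeq \eta \, \Delta J \, \tau \; \propto \; \eta \, N_\mathrm{tot}$$

$$\tau \gg 1: \quad T_B \simeq \eta \, \Delta J$$

In the thin limit the line intensities only constrain the product of the dilution factor
and the column density, so a smaller source can be traded for a larger column density. In
the thick limit the column density drops out, and the intensities only constrain the
product of the dilution factor and $\Delta J$, which depends on the temperature.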

Several components with e.g. different kinetic temperatures or column densities can be
included in the computation. For this, we assume that the various components are not
coupled radiatively - that is, a photon from one component cannot be absorbed by another,
foreground component - in which case the emerging spectrum is simply the sum of the
brightness temperatures of each component given by Eq. (1). Each of these components can
be Doppler-shifted with respect to the others, which is useful when modeling sources with
several components at different velocities. It is also possible to compute the spectra
from several species; this is done by a summation of Eq. (1) over each species.

The column densities, kinetic temperatures, Doppler widths and source sizes for each
species and component are read from a text file. Einstein coefficients, upper level
degeneracies and energies as well as the partition functions are taken from spectral line
catalogs. Because these catalogs usually give the partition functions at a few
temperatures only, the partition function at the user temperature is computed from a
linear interpolation (or extrapolation if the user-given temperature is outside the range
of temperatures provided in the catalog). When computing the synthetic spectrum, a
frequency sampling corresponding to the minimum $\Delta V$ divided by 10 is taken (or a
frequency sampling equal to that of the observed spectra, if it is smaller than the
minimum $\Delta V$ divided by 10). This ensures that the sampling at all frequencies and
for all species and components is sufficient. At the end of the computation, the
synthetic spectrum is re-sampled to the same channel spacing as the observed spectrum in
order to take the channel dilution factor into account. This allows for a direct
comparison between the synthetic and observed spectra.

^3 (Sub-)millimeter telescopes usually have tapers that limit the power received in
side-lobes. Because of this, the telescope beam size may differ from that of a purely
diffraction-limited antenna of the same diameter. However, the difference between the two
is usually small: at 100 GHz, the measured FWHM of the IRAM-30m is 24.6", while Eq. (3)
gives 25.2".
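
The partition function interpolation described above can be sketched as follows. Note
that `numpy.interp` clamps to the end values outside the tabulated range, so the linear
extrapolation is written explicitly, using the two nearest tabulated points. The function
name and the tabulated values are made up for illustration, the temperatures are assumed
to be sorted in increasing order, and this is not the Weeds source code.

```python
import numpy as np


def partition_function(temp, cat_temps, cat_q):
    """Q(T) by linear interpolation, with linear extrapolation outside the table.

    cat_temps must be sorted in increasing order and contain at least two values.
    """
    cat_temps = np.asarray(cat_temps, dtype=float)
    cat_q = np.asarray(cat_q, dtype=float)
    if temp < cat_temps[0]:
        i = 0                       # extrapolate from the first two points
    elif temp > cat_temps[-1]:
        i = len(cat_temps) - 2      # extrapolate from the last two points
    else:
        return float(np.interp(temp, cat_temps, cat_q))
    slope = (cat_q[i + 1] - cat_q[i]) / (cat_temps[i + 1] - cat_temps[i])
    return float(cat_q[i] + slope * (temp - cat_temps[i]))


# Made-up table, for illustration only.
temps = [10.0, 20.0, 40.0]
q_values = [5.0, 10.0, 30.0]
q_inside = partition_function(15.0, temps, q_values)
q_above = partition_function(50.0, temps, q_values)
q_below = partition_function(5.0, temps, q_values)
```

Extrapolating far outside the tabulated range can give poor (or even negative) values,
so results obtained at such temperatures should be treated with caution.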

In Fig. 1 we show an example of such modeling. The figure shows a spectrum between 524.2
and 525.2 GHz observed towards Orion-KL with Herschel-HIFI as part of the HEXOS
guaranteed time key program (Bergin et al. 2010). These data have already been presented
by Wang et al. (2010). Several methanol lines are detected. On this figure we show model
predictions computed with Weeds for a single component source
with A?(CH30H) = 2 X 10'^ cm^^, Qs = 18", T = 80 K and 
$\Delta V$ = 4 km/s, and using the JPL database. Overall, the model predictions are in
good agreement with the observations - in particular, we successfully reproduce the
relative intensities of the brightest lines. On the other hand, this simple model
underestimates the small line at 524620 MHz and the shoulder at 524880 MHz, possibly
suggesting several emitting components and/or non-LTE excitation. Note that for the given
parameters the emission is predicted to be optically thin, so that the column density and
source size cannot be constrained independently. A complete analysis of the methanol
emission in this source is clearly beyond the scope of this paper; however, this example
demonstrates how a simple LTE model is useful to identify lines in a spectral survey.
Finally, we have cross-checked these model predictions with CASSIS, and the two packages
were found to be in excellent agreement.

4. Conclusions and prospects 

We have presented an extension of the CLASS data reduction software for analyzing
spectral surveys. This extension allows the user to make queries in spectral line
databases using a VO-compliant protocol. It also allows the user to quickly search for
the various transitions of a given species. Finally, it can compute model predictions at
LTE, as often needed to identify lines in spectra close to the confusion limit. Weeds has
already been successfully used to analyze part of the IRAS 16293-2422 survey obtained
with Herschel-HIFI (Bacmann et al. 2010; Hily-Blant et al. 2010), and we expect that it
will be useful for future spectral surveys with this instrument as well. We think that it
will become a standard tool for analyzing spectral surveys obtained with single-dish
ground-based telescopes such as the IRAM-30m. Yet, Weeds is not limited to the analysis
of single-dish observations. It may be used to analyze spectral surveys obtained with
interferometers as well, such as the IRAM Plateau de Bure, CARMA, the SMA, and the
upcoming ALMA and eVLA interferometers. In fact, since Weeds is written in Python, it
could be used from the Python-based CASA software that will be used by the eVLA and ALMA.
However, analyzing ALMA data will be challenging, because these data will consist of
large spectral cubes, i.e. essentially a spectral survey on a large number of pixels.
Doing such an analysis by hand, i.e. identifying the various lines/species on each
spectrum of a map, is probably impossible; this will require automatic fitting tools to
extract the relevant information (column densities and excitation temperatures of the
various species) as a function of position. Such tools require efficient minimization
algorithms to fit a model with a large number of free parameters to the data. The
development of such tools is already in progress (e.g. in XCLASS using the MAGIX
minimization framework), and implementing these automatic fitting tools in Weeds would be
desirable in the future.

Fig. 1. Spectra between 524.2 and 525.5 GHz observed towards Orion-KL with Herschel-HIFI
(filled histogram) and LTE model produced with Weeds (continuous black line). The rest
frequencies of several detected methanol lines are indicated.